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Abstract 

P\| , We study the problem of fitting circles (or circular arcs) to data 

points observed with errors in both variables. A detailed error analysis 
■ for all popular circle fitting methods - geometric fit, Kasa fit, Pratt 

fit, and Taubin fit - is presented. Our error analysis goes deeper 
than the traditional expansion to the leading order. We obtain higher 
order terms, which show exactly why and by how much circle fits differ 
from each other. Our analysis allows us to construct a new algebraic 
(non-iterative) circle fitting algorithm that outperforms all the existing 
methods, including the (previously regarded as unbeatable) geometric 
fit. 
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O 1 Introduction 



Fitting circles and circular arcs to observed points is one of the basic tasks 
in pattern recognition and computer vision, nuclear physics, and other areas 
[HI El HH EH EH E2l [30l [32]. Many algorithms have been developed that 
fit circles to data. Some minimize the geometric distances from the circle 
to the data points (we call them geometric fits). Others minimize various 
approximate (or 'algebraic') distances, they are called algebraic fits. We 
overview most popular algorithms in Sections [3HH 

Geometric fit is commonly regarded as the most accurate, but it can only 
be implemented by iterative schemes that are computationally intensive and 
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subject to occasional divergence. Algebraic fits are faster but presumably 
less precise. At the same time the assessments on their accuracy are solely 
based on practical experience, no one has performed a detailed theoretical 
comparison of the accuracy of various circle fits. It was shown in [8] that 
all the circle fits have the same covariance matrix, to the leading order, in 
the small-noise limit. Thus the differences between various fits can only be 
revealed by a higher-order error analysis. 

The purpose of this paper is to do just that. We employ higher-order 
error analysis (a similar analysis was used by Kanatani |22j in the context 
of more general quadratic models) and show exactly why and by how much 
the geometric circle fit outperforms the algebraic circle fits in accuracy; we 
also compare the precision of different algebraic fits. Section presents our 
error analysis in a general form, which can be readily applied to other curve 
fitting problems. 

Finally, our analysis allows us to develop a new algebraic fit whose ac- 
curacy exceeds that of the geometric fit. Its superiority is demonstrated by 
numerical experiments. 

2 Statistical model 

We adopt a standard functional model in which data points (xi, yi), . . . , (x n , y n ) 
are noisy observations of some true points (xi, y{), . . . , (x n , y n ), i.e. 

(1) x i = x i + 5 i , yi = yi + £i, i = l,..., n, 

where (Si,£i) represent isotropic Gaussian noise. Precisely, <Vs and £j's are 
i.i.d. normal random variables with mean zero and variance a 2 . 

The true points (xi,yi) are supposed to lie on a 'true circle', i.e. satisfy 

(2) {xi - ~a) 2 + (yi - b) 2 = R 2 , z = l,...,n, 
where (a, b, R) denote the 'true' (unknown) parameters. Therefore 

(3) Xi = a + Rcosipi, yi — b + Rsxntfii, 

where (fi, . . . , (p n specify the locations of the true points on the true circle. 
The angles (fx, . . . , cp n are regarded as fixed unknowns and treated as addi- 
tional parameters of the model (called incidental or latent parameters). For 
brevity we denote 

(4) Ui = cos (pi = - a)/R, v { = sin^ = (y { - b)/R. 
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Note that uf + vf = 1 for every i. 

Remark. In our paper Si and £j have common variance a 2 , i.e. our noise is 
homoscedastic. In many studies the noise is heteroscedastic [25, 35J, i.e. the 
normal vector (5»,£i) has point-dependent covariance matrix a 2 Ci, where Cj 
is known and depends on i, and a 2 is an unknown factor. Our analysis can 
be extended to this case, too, but the resulting formulas will be somewhat 
more complex, so we leave it out. 

3 Geometric circle fits 

A standard approach to fitting circles to 2D data is based on orthogonal 
least squares, it is also called geometric fit, or orthogonal distance regression 
(ODR). It minimizes the function 

(5) F(a,b,R) = J2dl 

where di stands for the distance from (xi,yi) to the circle, i.e. 

(6) di = Ti- R, Ti = \J (xi - a) 2 + (yi - b) 2 , 

where (a, b) denotes the center, and R the radius of the circle. 

In the context of the functional model, the geometric fit returns the max- 
imum likelihood estimates (MLE) of the circle parameters [6], i.e. 

(7) (d MLE ,b MLE , R MLE ) = argmin J 7 (a, b, R) . 

A major concern with the geometric fit is that the above minimization 
problem has no closed form solution. All practical algorithms of minimiz- 
ing JF are iterative; some implement a general Gauss-Newton P, [J5] or 
Levenberg-Marquardt [9] schemes, others use circle-specific methods pro- 
posed by Landau [21] and Spath [30]. The performance of iterative algo- 
rithms heavily depends on the choice of the initial guess. They often take 
dozens or hundreds of iterations to converge, and there is always a chance 
that they would be trapped in a local minimum of T or diverge entirely. 
These issues are explored in [§]. 

A peculiar feature of the maximum likelihood estimates (a, b, R) of the 
circle parameters is that they have infinite moments [7], i.e. 

(8) E(|a|) =E(|S|) =E(R) = oo 
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for any set of true values (a,b,R); here E denotes the mean value. This 
happens because the distributions of these estimates have somewhat heavy 
tails, even though those tails barely affect the practical performance of the 
MLE (the same happens when one fits straight lines to data with errors in 
both variables [21 [3] ) . 

To ensure the existence of moments one can adopt a different parameter 
scheme. An elegant scheme was proposed by Pratt [27] and others [13], which 
describes circles by an algebraic equation 

(9) A(x 2 + y 2 ) + Bx + Cy + D = 

with an obvious constraint A ^ (otherwise this equation describes a line) 
and a less obvious constraint B 2 + C 2 — 4 AD > 0. The necessity of the latter 
can be seen if one rewrites equation (jUj) as 

/ B\ 2 / B 2 + C 2 ~AAD 
< 10) (*"2a) + (y-2l) 4^ = °- 

It is clear now that defines a circle if and only if B 2 + C 2 — AAD > 0. 

As the parameters (A, B, C, D) only need to be determined up to a scalar 
multiple, it is natural to impose a constraint 

(11) B 2 + C 2 - 4AD = 1, 

because it automatically ensures B 2 + C 2 — A AD > 0. The constraint (fTTj) was 
first proposed by Pratt [27J. Under this constraint, the parameters A, B, C, D 
are essentially bounded, see [9], and their maximum likelihood estimates can 
be shown to have finite moments. 

The equation (jUJ), under the constraint ffTTj) . conveniently describes all 
circles and lines (the latter are obtained when A = 0); the inclusion of lines 
is necessary to ensure the existence of the least squares solution 0, (26J [37] . 

After one estimates the algebraic circle parameters A, B, C, D, they can 
be converted to the natural parameters via 

B C 2 B 2 + C 2 - 4AD 
12 a— -, b= -, R = — . 

v ' 2A' 2A' AA 2 



4 Algebraic circle fits 

An alternative to the complicated geometric fit is made by fast non-iterative 
procedures called algebraic fits. We describe three most popular algebraic 
circle fits below. 
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Kasa fit. One can find a circle by minimizing the function 



(13) 



= + y2 ~ 2aXi - 2byi + « 2 + &2 - r2 Y- 



In other words, one minimizes JF K = ffi where f\ = r 2 — R 2 is the so 
called algebraic distance from the point (xi,yi) to the circle. A change of 
parameters B = —2a, C = —2b, D = a 2 + b 2 — R 2 transforms (fT3"|) to a linear 
least squares problem minimizing 



where we denote Zi = xj+y 2 for brevity (we intentionally omit symbol A here 
to make our formulas consistent with the subsequent ones) . Now the problem 
reduces to a system of linear equations (normal equations) with respect to 
B, C, D that can be easily solved, and then one recovers the natural circle 
parameters a, b, R via f|T2|) . 

This method was introduced in the 1970s by Delogne [UJ and Kasa [23J, 
and then rediscovered and published independently by many authors, see 
references in [9]. It remains popular in practice. We call it Kasa fit. 

The Kasa method is perhaps the fastest circle fit, but its accuracy suffers 
when one observes incomplete circular arcs (partially occluded circles); then 
the Kasa fit is known to be heavily biased toward small circles jH] • The reason 
for the bias is that the algebraic distances fa provide a poor approximation 
to the geometric distances d{\ in fact, 



hence the Kasa fit minimizes JF K m 2R 2 ^2d 2 , and it often favors smaller 
circles minimizing R 2 rather than the distances di. 

Pratt fit. To improve the performance of the Kasa method one can minimize 
another function, T = jj^J-k, which provides a better approximation to 
df. This new function, expressed in terms of A, B, C, D reads 



(14) 




(15) 



fi = {n - R)( ri + R) = d,(2R + di) w 2Rd 



(16) 




due to ( 1T21) . Equivalently, one can minimize 



(17) T(A, B, C, D) = J2i Az * + Bx i + °Vi + D ? 



5 



subject to the constraint (11 II) . This method was proposed by Pratt [27J. 

Taubin fit. A slightly different method was proposed by Taubin [32] who 
minimizes the function 



Here we use standard 'sample means' notation: x = ^ Yl Xj, etc. 

General remarks. Note that the minimization of (fTTl) must use some con- 
straint, to avoid a trivial solution A = B = C = D = 0. Pratt and Taubin 
fits utilize constraints (fTTl) and (121}]) . respectively. Kasa fit also minimizes 
(|T7|) . but subject to constraint A = 1. 

While the Pratt and Taubin estimates of the parameters A, B, C, D have 
finite moments, the corresponding estimates of a, b, R have infinite moments, 
just like the MLE (JH]). On the other hand, Kasa's estimates of a,b,R have 
finite moments whenever n > 4; see |37j . 

All the above circle fits have an important property - they are indepen- 
dent of the choice of the coordinate system, i.e. their results are invariant 
under translations and rotations; see a proof in |14j . 

Practical experience shows that the Pratt and Taubin fits are more stable 
and accurate than the Kasa fit, and they perform nearly equally well, see [9]. 
Taubin [32j intended to compare his fit to Pratt's theoretically, but no such 
analysis was ever published. We make such a comparison below. 

There are many other approaches to the circle fitting problem in the 
modern literature [H [TOl EH [28l [29l EU [33l [36l EH], but most of them are 
either quite slow or can be reduced to one of the algebraic fits [HJ Chapter 8]. 

Matrix representation. We can represent the above three algebraic fits in 
matrix form. Let A = (A, B, C, D) denote the parameter vector, 



E[(^-a) 2 + (^-&) 2 -i? 2 ] 2 

4^- 1 E[(^-«) 2 + (^-^) 2 ] ' 




Expressing it in terms of A,B,C,D gives 
na\ -r _V" [Azi + Bxi + C yi + D} 2 



Zi Xi V\ 1 



(21) 



Z 



def 



'II 



1 
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the 'data matrix' (recall that z\ = x\ + yf) and 



(22) 



M^-Z T Z 

n 
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the 'matrix of moments'. All the algebraic circle fits minimize the same objec- 
tive function .F(A) = A T MA, cf. (TTTD, subject to a constraint A T NA = 1, 
where the matrix N corresponds to the fit. The Pratt fit uses 



(23) 



the Taubin fit uses 



N 



def 







-2 



-2 



1 




(24) 



N 



def 



Az 2x 2y 

2x 1 

2y 1 





def 



and the Kasa uses N = K =' e\ej , where ei = (1, 0, 0, 0) T . 

To solve the above constrained minimization problem one uses a Lagrange 
multiplier rj and reduces it to unconstrained minimization of the function 

(25) e(A,f|) = A T MA-f|(A T NA-l). 
Differentiating with respect to A gives 

(26) MA = r]NA, 

thus A must be a generalized eigenvector for the matrix pair (M, N). This 
fact is sufficient for the subsequent analysis, because it determines A up to 
a scalar multiple, and multiplying A by a scalar does not change the circle 
it represents, so we can set ||A|| = 1. 

Remark. The generalized eigenvalue problem (1261) may have several solu- 
tions. To choose the right one we note that for each solution (r), A) 

A T MA = r]A T NA = r], 

thus for the purpose of minimizing A T MA we should choose the solution of 
with the smallest positive rj. 
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5 Error Analysis: a general scheme 



We employ an error analysis scheme based on a 'small noise' assumption. 
That is, we assume that the errors Si and Si (Section [2]) are small and treat 
their standard deviation a as a small parameter. The sample size n is fixed, 
though it is not very small. 

This approach goes back to Kadane [16] and was employed by Anderson 
[2] and other statisticians [3J. More recently it has been used by Kanatani 
[T9l [22] in image processing applications, who argued that the 'small noise' 
model, where a — > while the sample size n is kept fixed, is more appropriate 
than the traditional statistical 'large sample' approach, where n — > oo while 
a > is kept fixed. We use a combination of these two models: our main 
assumption is a — > 0, but n is regarded as a slowly increasing parameter; 
more precisely we assume n <C ct~ 2 . 

Suppose one is fitting curves defined by an implicit equation 

(27) P(x,y;@)=0, 

where = (#i, . . . , 6k) T denotes a vector of unknown parameters to be esti- 
mated. Let = (#i, . . . , 6k) T be the 'true' parameter vector corresponding 
to the 'true' curve P(x,y; 0) = 0. As in Section [2] let (xi,yi), % — 1, . . . ,n, 
denote true points, which lie on the true curve, and (xi,yi) observed points 
satisfying ([T]). Let 0(xi, yi, . . . , x n , y n ) be an estimator. We assume that 
is a regular (at least four times differentiable) function of observations 
(xi,yi). The existence of the derivatives of is only required at the true 
points (xi,yi) = (xi,yi), and it follows from the implicit function theorem 
under general assumptions provided P(x, y\ 0) in fT2T|) is differentiable (in 
most cases P is a polynomial in all its variables); we omit the proof. 

For brevity we denote by X = (xi, y±, . . . , x n , y n ) T the vector of all ob- 
servations, so that X = X + E, where X = (xi,yi, . . . ,x n , y n ) T is the vector 
of the true coordinates and E = (S±, e±, . . . , S n , e n ) T is the 'noise vector'; 
the components of E are i.i.d. normal random variables with mean zero and 
variance cr 2 . 

We use Taylor expansion to the second order terms. To keep our notation 
simple, we work with each scalar parameter 6 m of the vector separately: 

(28) ro (X) = m (X) + G^E + | E T H m E + P (a 3 ). 

Here G m = V9 m and H m = V 2 9 m denote the gradient (the vector of the first 
order partial derivatives) and the Hessian matrix of the second order partial 
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derivatives of 9 m , respectively, taken at the true vector X. The remainder 
term Op(a 3 ) in (1281) is a random variable 1Z such that <j~ 3 7Z is bounded in 
probability. 

Expansion (128]) shows that ©(X) — > 0(X) in probability, as o — > 0. It is 
convenient to assume that 

(29) 0(X) = e. 

Precisely (1291) means that whenever a = 0, i.e. when the true points are 
observed without noise, then the estimator returns the true parameter vector, 
i.e. finds the true curve. Geometrically, it means that if there is a model curve 
that interpolates the data points, then the algorithm finds it. 

With some degree of informality, one can assert that whenever (1291) holds, 
the estimate is consistent in the limit o — > 0. This is regarded as a minimal 
requirement for any sensible fitting algorithm. For example, if the observed 
points lie on one circle, then every circle fitting algorithm finds that circle 
uniquely. Kanatani [20] remarks that algorithms which fail to follow this 
property "are not worth considering" . 

Under the assumption f[2"$j) we rewrite PS]) as 

(30) A0 m (X) = G^E + | E T H m E + O p {g z ), 

where A0 m (X) = m (X) — 9 m is the statistical error of the parameter esti- 
mate. 

The accuracy of an estimator 9 in statistics is characterized by its Mean 
Squared Error (MSE) 

(31) E [(0 - 6) 2 ] = Var(0) + [bias(0)] 2 , 

where bias(0) = E(0) — 9. But it often happens that exact (or even ap- 
proximate) values of E(0) and Var(0) are unavailable because the probability 
distribution of 9 is overly complicated, which is common in curve fitting 
problems, even if one fits straight lines to data points; see [21 [3]. There are 
also cases where the estimates have theoretically infinite moments because 
of somewhat heavy tails, which on the other hand barely affect their prac- 
tical performance. Thus their accuracy should not be characterized by the 
theoretical moments which happen to be affected by heavy tails; see also 
[2]. In all such cases one usually constructs a good approximate probabil- 
ity distribution for 9 and judges the quality of 9 by the moments of that 
distribution. 
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It is standard [H El El EE2J EI| to construct a normal approximation to 9 and 
treat its variance as an 'approximative' MSE of 9. The normal approximation 
is usually based on the leading term in the Taylor expansion, like G^E in 
(130]) . For circle fitting algorithms, the resulting variance (see below) will 
be the same for all known methods, so we will go one step further and use 
the second order term. This gives us a better approximative distribution 
and allows us to compare circle fitting methods. In our formulas, E(# m ) 
and Var(# m ) denote the mean and variance of the resulting approximative 
distribution. 

The first term in (130]) is a linear combination of i.i.d. normal random 
variables that have zero mean, hence it is itself a normal random variable with 
zero mean. The second term is a quadratic form of i.i.d. normal variables. 
Since H m is a symmetric matrix, we have H m = Q^D m Q m , where Q m is an 
orthogonal matrix and D m = diag{c?i, . . . , d,2n} is a diagonal matrix. The 
vector E m = Q m E has the same distribution as E does, i.e. its components 
are i.i.d. normal random variables with mean zero and variance a 2 . Thus 

(32) E T H m E = E^D m E m = a 2 ^ d t Z 2 , 

where the Zj's are i.i.d. standard normal random variables, and the mean 
value of (13"2"]) is 

(33) E(E T H m E) = (j 2 trD m = a 2 trH m . 
Therefore, taking the mean value in (130]) gives 

(34) bias(# m ) = E(A# m ) = \ a 2 tr H m + 0(a A ). 

Note that the expectations of all third order terms vanish, because the com- 
ponents of E are independent and their first and third moments are zero; 
thus the remainder term is of order cr 4 . 

Squaring (1311 and again using f[3"2"j) give the mean squared error (MSE) 

(35) E([A^ n ] 2 ) = ( x 2 G^G m + ia 4 ([trH m ] 2 + 2||H m || 2 ,)+^, 

where ||H m ||| 1 = trH 2 ^ is the Frobenius norm (note that ||H m |||, = ||D m |||, = 
trD^). The remainder TZ includes terms of order a 6 , as well as some terms 
of order cr 4 that contain third order partial derivatives, such as d 3 9 m /dx^ 
and d 3 9 m /dx 2 dxj. A similar expression can be derived for E(A6 l m A^ m ') for 
m ^ m', we omit it and only give the final formula below. 
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Classification of higher order terms. In the MSE expansion (j55j) . the 
leading term a 2 G^G m is the most significant. The terms of order cr 4 are 
often given by long complicated formulas. Even the expression for the bias 
(j34p may contain several terms of order a 2 , as we will see below. Fortunately, 
it is possible to sort them out keeping only the most significant ones, see next. 

Kanatani [22] recently derived formulas for the bias of certain ellipse 
fitting algorithms. First he found all the terms of order a 2 , but in the end 
he noticed that some terms were of order a 2 (independent of n), while the 
others of order a 2 /n. The magnitude of the former was clearly larger than 
that of the latter, and when Kanatani made his conclusions he ignored the 
terms of order a 2 /n. Here we formalize Kanatani's classification of higher 
order terms as follows: 

- In the expression for the bias (|34|) we keep terms of order a 2 (independent 
of n) and ignore terms of order a 2 /n. 

- In the expression for the mean squared error (135]) we keep terms of order 
a 4 (independent of n) and ignore terms of order a 4 /n. 

These rules agree with our assumption that not only a — > 0, but also n — > 
oo, although n increases rather slowly (n <C \ja 2 \ Such models were studied 
by Amemiya, Fuller and Wolter [lj [31] who made a more rigid assumption 
that n ~ a~ a for some < a < 2. 

Now it turns out (we omit detailed proofs; see [H]) that the main term 
cr 2 G^G m in our expression for the MSE (1331) is of order a 2 /n; so it will never 
be ignored. Of the fourth order terms, ~ o" 4 ||H m |||, is of order <r 4 /n, hence it 
will be discarded, and the same applies to all the terms involving third order 
partial derivatives mentioned above. 

The bias a 2 tr H m in flMl) is, generally, of order a 2 (independent of n) , 
thus its contribution to the mean squared error (13"5l) is significant. However 
the full expression for the bias may contain terms of order a 2 and of order 
<J 2 /n, of which the latter will be ignored; see below. 

Now the terms in ([35]) have the following orders of magnitude: 

(36) E([A9 m } 2 ) = 0{a 2 /n) + 0(a 4 ) + 0(a A /n) + 0(a 6 ), 

where each big-0 simply indicates the order of the corresponding term in 
( I35|) . It is interesting to roughly compare their values numerically. In typical 
computer vision applications, a does not exceed 0.05; see [Sj. The number of 
data points normally varies between 10-20 (on the low end) and a few hundred 
(on the high end). For simplicity, we can set n ~ l/cr for smaller samples 
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a 2 /n 


<x 4 


a 4 /n 


a 6 


small samples (n ~ 1/cr) 


cr 3 


cr 4 


cr 5 


cr 6 


large samples {n ~ 1/cr 2 ) 


cr 4 


a 4 


cr 6 


a 6 



Table 1: The order of magnitude of the four terms in fl35l) . 



and n ~ 1 /cr 2 for larger samples. Then Table [T] presents the corresponding 
typical magnitudes of each of the four terms in ( 1351) . 

We see that for larger samples the fourth order term coming from the 
bias may be just as big as the leading second-order term, hence it would be 
unwise to ignore it. Earlier studies, see e.g. [3 [8], [T7], usually focused on the 
leading, i.e. second-order, terms only, disregarding all the fourth-order terms, 
and this is where our analysis is different. We make one step further - we 
keep all the terms of order 0(a 2 /n) and 0(a 4 ). The less significant terms of 
order 0(a 4 /n) and C(cr 6 ) would be discarded. 

Now combining all our results gives a matrix formula for the (total) mean 
squared error (MSE) 



(37) E (A0)(A0) 



cr 2 GG T + cr 4 BB T + • • • 



where G is the k x 2n matrix of first order partial derivatives of ©(X), its 
rows are G^, 1 < m < k, and B = |[trHi, . . . trHfc] T is the fc-vector that 
represents the leading term of the bias of 0, cf. (13^1) . The trailing dots in 
(I3T|) stand for all insignificant terms (those of order cr 4 /n and cr 6 ). 

We call the first (main) term cr 2 GG T in (13 7p the variance term, as it 
characterizes the variance (more precisely, the covariance matrix) of the es- 
timator 0, to the leading order. For brevity we denote V = GG T . The 
second term cr 4 BB T is the 'tensor square' of the bias cr 2 B of the estimator, 
again to the leading order. When we deal with particular estimators in the 
next sections, we will see that the actual expression for the bias is a sum of 
terms of two types: some of them are of order 0(a 2 ) and some others are of 
order C(cr 2 /n), i.e. 

(38) E(A0) = cr 2 B + C(cr 4 ) = cr 2 B! + cr 2 B 2 + C(cr 4 ), 
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where Bx = 0(1) and B 2 = 0(l/n). We call <j 2 B 1 the essential bias of the 
estimator 0. This is its bias to the leading order, a 2 . The other terms, i.e. 
a 2 B 2 , and C(o" 4 ), constitute non-essential bias; they can be discarded. Now 
(1371) can be written as 



(39) 



E 



(A0)(A0) 



T 



a 2 V + cr 4 BiBf + 



where we only keep significant terms of order a 2 /n and <r 4 and drop the rest. 

KCR lower bound. The matrix V representing the leading terms of the 
variance has a natural lower bound (an analogue of the Cramer- Rao bound): 
for every curve family ff27j) there is a symmetric positive semi-definite matrix 
V min such that for every estimator satisfying (1291) 



(40) 



V > V 



IIP 'II 2 



in the sense that V — V min is a positive semi-definite matrix. Here 



P &1 = <9P(x i; 0)/00i, . . . , aP( Xi ; Q)/d6 h 



(41) 



stands for the gradient of P with respect to the model parameters 9\, 
and 



(42) 



dP(± i; &)/dx,dP(± i; @)/dy 



for the gradient with respect to the planar variables x and y; both gradients 
are taken at the true point 5q = (xi,yi). For example in the case of fitting 
circles defined by P = (x — a) 2 + (y — 6) 2 — R 2 , we have 



(43) P ei = -2((x i -a),(y i -b),R) 
Therefore, 



-Pv 



7\ \ T 1 



2((x i - a), (j/i - b)) 



(44) 
where 

(45) 



V min = (w T w)-\ 



W 



def 



Ml Vi 1 
Un 1 
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and Ui, Vi are given by (j3J. 

The general inequality (HOI) was proved by Kanatani [T71 [TS| for unbiased 
estimators and then extended by Chernov and Lesort [8] to all estimators 
satisfying (1291) . The geometric fit (which minimizes orthogonal distances) 
always satisfies ff29l) and attains the lower bound V m i n ; this was proved by 
Fuller (Theorem 3.2.1 in [T2]) and independently by Chernov and Lesort [8], 
who named the inequality (14"U1) Kanatani- Cramer- Rao (KCR) lower bound. 
See also survey [25J for the more general case of heteroscedastic noise. 

Assessing the quality of estimators. Our analysis dictates the following 
strategy of assessing the quality of an estimator 0: first of all, its accuracy is 
characterized by the matrix V, which must be compared to the KCR lower 
bound V m j n . We will see that for all the circle fitting algorithms the matrix 
V actually achieves its lower bound V min , i.e. we have V = V min , hence these 
algorithms are optimal to the leading order. 

Next, once the factor V is already at its natural minimum, the accu- 
racy of an estimator should be characterized by the vector Bi representing 
the essential bias - better estimates should have smaller essential biases. It 
appears that there is no natural minimum for ||Bi||, in fact there exist esti- 
mators which have a minimum variance V = V m j n and a zero essential bias, 
i.e. Bi = 0. We will construct such an estimator in Section [3 

6 Error analysis of geometric circle fit 

Here we apply the general method of the previous section to the geometric 
circle fit, i.e. to the estimator = (a, b, R) of the circle parameters minimiz- 
ing the sum ^ df of orthogonal (geometric) distances from the data points 
to the fitted circle. 

Variance of the geometric circle fit. We start with the main part of our 
error analysis - the variance term represented by cr 2 V in fl39l) . The distances 
di = ri — R can be expanded as 

di = yj + 5~) - (a + Aa)] 2 + [(y t + e t ) - (b + A6)] 2 -R-AR 

= \J R 2 + 2Rui{5i - Aa) + IRv^Ei - Ab) + P {a 2 ) -R-AR 
(46) = u t (5 t - Aa) + v t (e t - Ab) -AR + P (a 2 ), 
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see (|3]) . Minimizing ^ df to the first order is equivalent to minimizing 

(47) ^(u* Aa + iJj A5 + AR - uA - niSif . 

This is a classical least squares problem that can also be written as 

(48) WA0siW + V£, 

where W is given by (jl5|) . © = (a, b, R) T , as well as d = (Si, . . . , <5 n ) T and 
e = (ei, . . . ,e n ) T , while U = diag(«i, . . . ,u n ) and V = diag(wi, . . . , v n ). 
The solution of the least squares problem fl4"8"j) is 

(49) AO = (W T W)- 1 W T (U5 + Ve), 

of course this does not include the Op(a 2 ) terms. Thus the variance of our 

estimator, to the leading order, is 

(50) 

E[(A0)(A0) T ] = (W T W)" 1 W T E[(U(5 + V£)(5 T U + e T V)]W(W T W)" 1 . 

Now observe that E(6e T ) = E(e8 T ) = 0, as well as E(66 T ) = E(ee T ) = cx 2 I, 
and we have U 2 + V 2 = I. Thus to the leading order 

(51) E[(A0)(A0) T ] = a 2 (W T W)- 1 W T W(W T W)- 1 = a 2 (W T W)-\ 

where the higher order (of a 4 ) terms are not included. Comparing this to 
( 144)) confirms that the geometric fit attains the minimal possible covariance 
matrix V. 

Bias of the geometric circle fit. Now we do a second-order error analysis, 
which has not been previously done in the literature. According to a general 
formula we put 

a = d + Axa + A 2 a + O p (<t 3 ), 

(52) b = b + A 1 b + A 2 b + P (a 3 ), 

R = R + AiR + A 2 R + P (a 3 ). 

Here Axa, A±b, AiR are linear combinations of £j's and 5^s, which were 
found above, in (H9|) . and A 2 a, A 2 b, A 2 R are quadratic forms of £j's and S^s 
to be determined next. 
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Expanding the distances di to the second order terms gives 

di = Ui{5i - Aia) + Vi(si - Aife) - A X R 

- in A 2 a - vt A 2 b - A 2 R + A.a) 2 + - A,b) 2 

(53) -^f (6 i -A 1 a){e i -A 1 b). 

Since we already found Aid, Ai&, AiR, the only unknowns are A 2 a, A 2 b, 
A 2 R. Minimizing d 2 is now equivalent to minimizing 

(54) ^(u. t A 2 a + ^A 2 6 + A 2 R-f % ) 2 , 
where 

S% = Ui(5i - Aid) + Vi(ei - Aib) - Aii? 

(55) + ^(6 i -A l a) 2 + ^{e i -A l b) 2 -^(6 i -A 1 a)(e i -A 1 b). 

This is another least squares problem, and its solution is 

(56) A 2 = (W T W)" 1 W T F, 

where F = (/i, . . . , /„) T ; of course this is a quadratic approximation which 
does not include Op(a 3 ) terms. In fact, the contribution from the first three 
(linear) terms in (155]) vanishes, quite predictably; thus only the last two 
(quadratic) terms matter. 

Taking the mean value gives, to the leading order, 

2 

(57) E(A0) = E(A 2 0) = ^ [(W T W)" 1 W T 1 + (W T W)" 1 W T S] , 

where 1 = (1, 1, ... , 1) T and S = (si, . . . , s n ) T , here Sj is a scalar 

(58) Sl = [-Vi, M i ,0](W T W)- 1 [-C i ,M i ,0] T 

The second term in (!57l) is of order 0(a 2 /n), thus the essential bias is 
given by the first term only, and it can be simplified. Since the last column of 
the matrix W T W coincides with the vector W T 1, we have (W T W) _1 W T 1 = 
[0, 0, 1] T , hence the essential bias of the geometric circle fit is 

(59) E(A0)^ s i[O,O,lf. 

Art 
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Thus the estimates of the circle center, a and b, have no essential bias, while 
the estimate of the radius has essential bias 

(60) E(AR) = 4, 

which is independent of the number and location of the true points. These 
facts are consistent with the results obtained by Berman [5] under the as- 
sumptions that a > is fixed and n — > oo. 



7 Error analysis of algebraic circle fits 

Here we analyze algebraic circle fits using their matrix representation. 

Matrix perturbation method. For every random variable, matrix or 
vector, L, we write 

(61) L = L + A X L + A 2 L + £> P (a 3 ), 

where L is its 'true', nonrandom, value (achieved when o = 0), AiL is a linear 
combination of S^s and £j's, and A 2 L is a quadratic form of S^s and £j's; all 
the higher order terms (cubic etc.) are represented by Op(o~ 3 ). For brevity, 
we drop the Op(a 3 ) terms in our formulas. Therefore A = A + A]A + A 2 A 
and M = M + AiM + A 2 M, and (J22D implies 

(62) A 1 M = n~ 1 (Z T A 1 Z + A^Z), 

(63) A 2 M = rT x { A x Z t AiZ + Z T A 2 Z + A 2 Z T Z). 

Since the true points lie on the true circle, ZA = 0, as well as MA = 
(hence M is a singular matrix). Therefore 

(64) A T A 1 MA = n- 1 A T (Z T A 1 Z + AxZ T Z)A = 0, 

hence A T MA = Op(a 2 ), and premultiplying (1261) by A T yields r] = Op(a 2 ). 
Next, substituting the expansions of M, A, and N into fl26l) gives 

(65) (M + AiM + A 2 M)(A + AjA + A 2 A) = r/NA 

(recall that N is data-dependent for the Taubin method, but only its 'true' 
value N matters, as rj = Op(o~ 2 ), hence the use of the observed values only 
adds higher order terms). Now using MA = yields 

(66) (M Ax A + AiM A) + (M A 2 A + A X M Ax A + A 2 M A) = r]NA 
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The left hand side of (1661) consists of a linear part (M A X A + AiMA) and 
a quadratic part (M A 2 A + A%M. A X A + A 2 M A). Separating them gives 

(67) MA 1 A + n~ 1 Z T A 1 ZA = 
(where we used ( 1621) and ZA = 0) and 

(68) MA 2 A + A1MA1A + A 2 MA = r]NA. 

Note that M is a singular matrix (because MA = 0), but whenever there are 
at least three distinct true points, they determine a unique true circle, thus 
the kernel of M is one-dimensional, and it coincides with span(A). Also, we 
set ||A|| = 1, hence AiA is orthogonal to A, and we can write 



(69) 



AiA = -n^rZ^AjZA, 



where M denotes the Moore-Penrose pseudo inverse. Now one can easily 
check that E(AiMAiA) = 0(a 2 /n) and E(A X A) = 0; these facts will be 
useful in the upcoming analysis. 

Variance of algebraic circle fits. From (|69|) we conclude that 

^ 2 M~E(Z T AiZ AA T AiZ T Z)M _ 



E[(A 1 A)(A 1 A) T ] 



(70) 
where 

(71) 



n 



n 



j 



M 



X 1 

m 
1 



and 



AiZj 



2xiSi + 2yiEi 
Si 

Si 





denote the columns of the matrices Z T and AiZ T , respectively. Next, 



(72) 
where 

(73) 



E[(A 1 Z 2 ,)(A 1 Z i ) T ] 







whenever 
whenever 



i = j 



rp def 



Azi 2xi 2yi 
2xi 10 

2m 10 
0000 
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Note n" 1 J2 Ti = f and A T t,A = A T PA = B 2 + (7 2 - 4ll) for each i; 
recall (E3D and (El. Hence 



(74) ^ ZjA T TjAZf = ^(A T PA)Z,Zf = n(A T PA)M. 

Combining the above formulas gives 



E[(AiA)(AiA) T ] =n- 2 M-|^Z i A T E(A 1 ZfA 1 Zj)AZ 



T 
3 

1,3 



M 



VlVT Zi A T f i AZ 

(75) = n-V 2 M-(A T PA). 

Remarkably, the variance of algebraic fits does not depend on the constraint 
matrix N, hence all algebraic fits have the same variance (to the leading 
order). In the next section we will derive the variance of algebraic fits in the 
natural circle parameters (a, b, R) and see that it coincides with the variance 
of the geometric fit fl5T|) . 

Bias of algebraic circle fits. Since E(AiA) = 0, it will be enough to find 
E(A 2 A). Premultiplying ([68]) by A T yields 

(7*\ ATMA A^MA + A^MAjA 

(76) V=— = = — — VOpia n). 

V ' A T NA A T NA V ' ' 

Recall that E(A 1 MA 1 A) = 0(a 2 /n), thus this term will not affect the 
essential bias and we drop it. Taking the mean value and using ( 1691) gives 

( 77) E( „ ) = A^M)A 

A T NA 



We substitute ( 1631) into (ITB"]) . use ZA = 0, then observe that 

(78) E(A X Z T AiZA) = a 2 T*A = 2Aa 2 ^ % + na 2 PA 

(here A is the first component of the vector A). Then, note that A 2 Zj 
(5 2 + e 2 , 0, 0, 0) T , and so 

(79) E(Z T A 2 Z)A = 2Ax 2 ^Z;. 
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Therefore the essential bias is given by 



(80) E(A 2 A) = -ct 2 M~ 

L ^ A T NA 

A more detailed analysis (which we omit) gives the following expression con- 
taining all the 0(a 2 ) and 0(a 2 /n) terms: 



■ ~ ~ A T PA 

AAn- 1 V Zj + PA - , , . NA 

A TTVT A 



E 



(1 _ 1 A PA ~ - 

- -NA 

A T NA 



(A 2 A) = -a 2 M- [AAn- 1 Zi + (1 - J)PA 
(81) - 4in" 2 ^(ZfM-Zi)Zi] + 0(A 

This expression demonstrates that the terms of order a 2 /n (the non-essential 
bias) only add a small correction, which is negligible when n is large. 

The expressions flHUl) and flHTl) can be simplified. Note that the vector 
n -1 ^2 Zj coincides with the last column of the matrix M, hence 



(82) 



4irT 1 Zi] = -4a 2 i [0, 0, 0, 1] T . 



In fact, this term will play the key role in the subsequent analysis. 



8 Comparison of various circle fits 

Bias of the Pratt and Taubin fits. We have seen that all the algebraic fits 
have the same main characteristic - the variance f!75|) . to the leading order. 
We will see below that their variance coincides with that of the geometric 
circle fit. Thus the difference between all our circle fits should be traced to 
the higher order terms, especially to their essential biases. 

First we compare the Pratt and Taubin fits. For the Pratt fit, the con- 
straint matrix is N = N = P, hence its essential bias ( |80|) becomes 

(83) E(A 2 A Pratt ) = -4a 2 A [0, 0, 0, 1] T . 

In other words, the Pratt constraint N = P cancels the second (middle) term 
in ( IHUl) ; it leaves the first term intact. 

For the Taubin fit, the constraint matrix is N = T and its 'true' value is 
N = f = i J2 Tij also note that T^A = 2AZi + PA for every i. Hence the 
Taubin's bias is 

(84) E(A 2 A Taubin ) ^ -2a 2 A [0, 0, 0, 1] T . 
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Thus, the Taubin constraint N = T cancels the second term in (I8"0j) and a 
half of the first term; it leaves only a half of the first term in place. 

As a result, the Taubin fit's essential bias is twice as small as that of the 
Pratt fit. Given that their main terms (variances) are equal, we see that the 
Taubin fit is statistically more accurate than that of Pratt. We believe our 
analysis answers the question posed by Taubin [52] who intended to compare 
his fit to Pratt's. 

'Hyperaccurate' algebraic fit. Our error analysis leads to another stun- 
ning discovery - an algebraic fit that has no essential bias at all. To our 
knowledge, this is the first such algorithm for curve fitting problems. 
Let us set the constraint matrix to 



S5) N = H d = 2T 



Then one can easily see that HA = AA \ s ^ J T*i J r PA, as well as A T HA = 
A T PA, hence all the terms in flHUj) cancel out! The resulting essential bias 
vanishes: 

(86) E(A 2 A Hypcr ) = 0. 

We call this fit hyperaccurate, or 'Hyper' for short. The term hyperaccuracy 
was introduced by Kanatani [2~Tj l2"2] who was first to employ Taylor expansion 
up to the terms of order a 4 for the purpose of comparing various algebraic 
fits and designing better fits. 

We note that the Hyper fit is invariant under translations and rotations 
because its constraint matrix H is a linear combination of two others, T and 
P, that satisfy the invariance requirements; see a proof in [TJ]. 

As any other algebraic circle fit, the Hyper fit minimizes the function 
JF(A) = A T MA subject to the constraint A r NA = 1 (with N = H), hence 
we need to solve the generalized eigenvalue problem MA = r/HA and choose 
the solution with the smallest positive eigenvalue rj (see the end of Section H]). 

The matrix H is not singular, three of its eigenvalues are positive and 
one is negative (these facts can be easily derived from the following simple 
observations: detH = —4, trace H = 8z + 2 > 1, and A = 1 is one of its 
eigenvalues). Assume that M is positive definite, then by Sylvester's law 
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of inertia, the matrix H _1 M has the same signature as H does, i.e. the 
eigenvalues 77 of H _1 M are all real, exactly three of them are positive and 
one is negative. The eigenpair (77, A) with the negative eigenvalue 77 does not 
represent any circle [TJ], so it is useless. (We note that Pratt's fit has similar 
properties, as det P = —4.) The eigenpair with the smallest positive 77 gives 
the best fit. Lastly, the matrix M is singular if and only if the observed 
points lie on a circle (or a line), in this case the eigenvector A corresponding 
to 77 = gives the interpolating circle (line). 

The Hyper fit can be computed by a numerically stable procedure involv- 
ing singular value decomposition (SVD). First, we compute the (short) SVD, 
Z = USV T , of the matrix Z. If its smallest singular value, 04, is less than a 
predefined tolerance e (we suggest e = 10 -12 ), then A is the corresponding 
right singular vector, i.e. the fourth column of the V matrix. In the regu- 
lar case (04 > e), one forms Y = VSV T and finds the eigenpairs of the 
symmetric matrix YH ^Y. Selecting the eigenpair (77, A*) with the smallest 
positive eigenvalue and computing A = Y^A* completes the solution. The 
prior translation of the coordinate system to the centroid of the data set 
(which ensures that x — y — 0) makes the computation of H _1 particularly 
simple. The corresponding MATLAB code is available from our web page 



Transition between parameter schemes. Our next goal is to express the 
covariance and the essential bias of the algebraic circle fits in terms of the 
natural parameters © = (a, b, R) T . Taking partial derivatives in (|12p gives a 
3x4 'Jacobian' matrix 



17) J = 







B 2 _ J_ 

2A 2 2A 

it- ^ 

R D B C 1 



A 2A 2 R 4A 2 R 4A 2 R 2AR -I 



Thus we have 

(88) AxG^AiA and A 2 © = J A 2 A + Op(a 2 /n), 

where J denotes the matrix J at the true parameters (A,B,C,D). The 
remainder term Op(a 2 /n) comes from the second order partial derivatives, 
for example 

(89) A 2 a = (Va) T (A 2 A) + i(A 1 A) T (V 2 a)(A 1 A), 
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where V 2 a is the Hessian matrix of the second order partial derivatives of a 
with respect to (A, B, C, D). The last term in (189"]) can be actually discarded, 
as it is of order Op(a 2 /n) because AiA = Opioj^pn). We collect all such 
terms in the remainder term Op(a 2 /n) in flHHl) . 

Next we need a useful fact. Suppose a point (x , yo) lies on the true circle 
(a, b, R), i.e. 

(90) (x -a) 2 + (y -b) 2 = R 2 . 

In accordance with our early notation we denote zq = Xq + y 2 and Zq = 
(z , x , y , 1) T . We also put u = (x — a)/R and v = (y — b)/R, and 
consider the vector W = (u ,v , 1) T . The following formula will be useful: 

(91) 2ARJ]VrZo = -n(W T W)" 1 W , 

where the matrix (W T W) _1 appears in (1571) and the matrix M~ appears in 
( 1751) . The identity (1911) is easy to verify directly for the unit circle a = b = 
and R — 1, and then one can check that it remains valid under translations 
and similarities. 

Equation (19T|) implies that for every true point (xi,yi) 

(92) 4i 2 # 2 JM"Z;ZfM- J T = n 2 (W T W)" 1 W i 'Wf (^W) -1 , 

where W, = (Hi, Vi, 1) T denote the columns of the matrix W, cf. ( )45l) . Sum- 
ming up over i gives 

(93) 4i 2 ,R 2 JM- J T = n(W T W)" 1 . 

Variance and bias of algebraic circle fits in the natural parameters. 

Now we can compute the variance (to the leading order) of the algebraic fits 
in the natural geometric parameters. Notice that the third relation in (Tl2l 
implies A T PA = B 2 + C 2 - AAD = AA 2 R 2 . Thus using ([93]) gives 

E[(A 1 0)(A 1 0) T ] = E[J(A 1 A)(A 1 A) T J T ] 

= n-V(A T PA)(JM-J T ) 
= a 2 {AA 2 R 2 n- l JM-J T ) 

(94) = o- 2 (W T W) _1 . 

Thus the variance of all the algebraic circle fits (to the leading order) coincides 
with that of the geometric circle fit, cf. ( |5Tl) . Therefore the difference between 
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all the circle fits should be then characterized in terms of their biases, which 
we do next. 

The essential bias of the Pratt fit is, due to fl83l) . 

(95) E(A 2 Pratt ) = 2a 2 iT 1 [0, 0, if. 

Observe that the estimates of the circle center are essentially unbiased, and 
the essential bias of the radius estimate is 2a 2 /R, which is independent of 
the number and location of the true points. We know that the essential bias 
of the Taubin fit is twice as small, hence 

(96) E(A 2 €) Taubin ) = a 2 BT l [0, 0, if. 

Comparing to fl59|) shows that the geometric fit has an essential bias that is 
twice as small as that of Taubin and four times smaller than that of Pratt. 
Therefore, the geometric fit has the smallest bias among all the popular circle 
fits, i.e. it is statistically most accurate. 

The formulas for the bias of the Kasa fit can be derived, too, but in general 
they are complicated. However recall that all our fits, including Kasa, are 
independent of the choice of the coordinate system, hence we can choose it 
so that the true circle has center at (0, 0) and radius R = 1. For this circle 
A = ^= [1,0,0, -1] T , hence PA = 2A and so M PA = 0, i.e. the middle 

term in (|80|) is gone. Also note that A T PA = 2, hence the last term in 
parentheses in QgQ) is 2^2 [1, 0, 0, 0] T . 




Figure 1: The arc containing the true points. 

Next, assume for simplicity that the true points are equally spaced on 
an arc of size 9 (a typical arrangement in many studies). Choosing the 
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total MSE 


= variance + 


(ess. bias) 2 + rest of MSE 


Pratt 


1.5164 


1.2647 


0.2500 


0.0017 


Taubin 


1.3451 


1.2647 


0.0625 


0.0117 


Geom. 


1.2952 


1.2647 


0.0156 


0.0149 


Hyper. 


1.2892 


1.2647 


0.0000 


0.0244 



Table 2: Mean square error (and its components) for four circle fits 
(10 4 xvalues are shown). In this test n = 100 points are placed (equally 
spaced) along a semicircle of radius R = 1 and the noise level is o = 0.05. 

coordinate system so that the east pole (1,0) is at the center of that arc (see 
Figured]) ensures y — xy — 0. It is not hard to see now that 

(97) rvr[l,0,0,0] T = \{xx - x 2 ) _1 [xx, -2x,0,xx] T '. 

Using the formula (188p we obtain (omitting details as they are not so relevant) 
the essential bias of the Kasa fit in the natural parameters (a, b, R): 

2 

(98) E(A 2 e Ka sa) C =2 C T 2 [0,0,l] T - _ a _ _ 2 [-x,0,xxf. 

The first term here is the same as in ( 1951) (recall that R = 1), but it is the 
second term above that causes serious trouble: it grows to infinity because 
xx — x 2 — > as 9 — > 0. This explains why the Kasa fit develops a heavy bias 
toward smaller circles when data points are sampled from a small arc. 

9 Experimental tests and conclusions 

To illustrate our analysis of various circle fits we have run a few computer 
experiments where we set n true points equally spaced along a semicircle of 
radius R — 1. Then we generated random samples by adding a Gaussian 
noise at level a = 0.05 to each true point, and after that applied various 
circle fits to estimate the parameters (a, b, R). 

Table [2] summarizes the results of the first test, with n = 100 points; it 
shows the mean square error (MSE) of the radius estimate R for each circle 
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total MSE 


= variance + 


(ess. bias) 2 + rest of MSE 


Pratt 


25.5520 


1.3197 


25.0000 


-0.76784 


Taubin 


7.4385 


1.3197 


6.2500 


-0.13126 


Geom. 


2.8635 


1.3197 


1.5625 


-0.01876 


Hyper. 


1.3482 


1.3197 


0.0000 


-0.02844 



Table 3: Mean square error (and its components) for four circle fits 
(10 6 x values are shown). In this test n = 10000 points are placed (equally 
spaced) along a semicircle of radius R = 1 and the noise level is o = 0.05. 



fit (obtained by averaging over 10 7 randomly generated samples). The table 
also gives the breakdown of the MSE into three components. The first two 
are the variance (to the leading order) and the square of the essential bias, 
both computed according to our theoretical formulas. These two components 
do not account for the entire mean square error, due to higher order terms 
which our analysis discarded. The remaining part of the MSE is shown in 
the last column, which is relatively small. (We note that only the total MSE 
can be observed in practice; all the other columns of this table are the results 
of our theoretical analysis.) 

We see that all the circle fits have the same (leading) variance, which 
accounts for the 'bulk' of the MSE. Their essential bias is different, it is 
highest for the Pratt fit and smallest (zero) for the Hyper fit. Algorithms 
with smaller essential biases perform overall better, i.e. have smaller mean 
square error. The Hyper fit is the best in our experiment; it outperforms the 
(usually unbeatable) geometric fit. 

To highlight the superiority of the Hyper fit, we repeated our experiment 
increasing the sample up to n = 10000, see Table [3] and Figure [2j We see 
that when the number of points is high, the the Hyper fit becomes several 
times more accurate than the geometric fit. Thus, our analysis disproves the 
popular belief in the statistical community that there is nothing better than 
minimizing the orthogonal distances. 

Needless to say, the geometric fit involves iterative approximations, which 
are computationally intensive and subject to occasional divergence, while our 
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Hyper fit is a fast non-iterative procedure, which is 100% reliable. 




1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 



Figure 2: MSE for various circle fits (on the logarithmic scale) versus the 
sample size n (from 10 to 10 4 ). 

Summary. All the known circle fits (geometric and algebraic) have the same 
variance, to the leading order. The relative difference between them can be 
traced to higher order terms in the expansion for the mean square error. The 
second leading term in that expansion is the essential bias, for which we have 
derived explicit expressions. Circle fits with smaller essential bias perform 
better overall. This explains a poor performance of the Kasa fit, a moderate 
performance of the Pratt fit, and a good performance of the Taubin and 
geometric fits (in this order). We showed that while there is a natural lower 
bound on the variance to the leading order (the KCR bound), there is no 
lower bound on the essential bias. In fact there exists an algebraic fit with 
zero essential bias (the Hyper fit), which outperforms the geometric fit in 
accuracy. We plan to perform a similar analysis for ellipse fitting algorithms 
in the near future. 

The authors are grateful to the anonymous referees for many helpful sug- 
gestions. N.C. was partially supported by National Science Foundation, grant 
DMS-0652896. 
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